TODO:

  • Create widget and deploy
  • Get more examples of not-poison-oak
  • Track a better metric than accuracy (we mainly care about recall: missing poison oak is worse than a false alarm)
  • Clean data more
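Since the TODO above singles out recall as the metric that matters, here is a minimal sketch of the computation. fastai2 also ships a `Recall` metric that could be passed in the learner's `metrics` list; the helper below is a hypothetical stand-alone version, not fastai code:

```python
# Recall = TP / (TP + FN): of all true poison-oak images, how many did we catch?
# A false negative (calling poison oak "leaves") is the costly error for this task.
def recall(y_true, y_pred, positive):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

y_true = ['poison-oak', 'poison-oak', 'leaves', 'poison-oak', 'leaves']
y_pred = ['poison-oak', 'leaves',     'leaves', 'poison-oak', 'poison-oak']
print(recall(y_true, y_pred, 'poison-oak'))  # 2 of the 3 true poison-oak images caught
```

With this framing, the false alarm at the end (leaves predicted as poison-oak) hurts precision but not recall, which is the trade-off the TODO accepts.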

Imports and set Bing API Key

In [86]:
from fastai2.vision.all import *
from fastai2.vision.widgets import *
from utils import *
import shutil
key = os.environ.get('AZURE_SEARCH_KEY', '')  # read the Bing Image Search key from the environment; avoid hard-coding (and publishing) a real key

Process Data

Load Data

In [122]:
path = Path('data'); path.ls()
Out[122]:
(#2) [Path('data/poison-oak'),Path('data/leaves')]
In [19]:
def search_images_bing(key, term, count, min_sz=128):
    # `api` and `auth` come from the course `utils` module (Azure Cognitive Services client helpers)
    client = api('https://api.cognitive.microsoft.com', auth(key))
    return L(client.images.search(query=term, count=count, min_height=min_sz, min_width=min_sz).value)
In [20]:
labels = ['poison-oak', 'leaves']
counts = [150, 450]
for label, count in zip(labels, counts):
    os.makedirs(path/label, exist_ok=True)
    res = search_images_bing(key, label, count)
    download_images(path/label, urls=res.attrgot('content_url'))
 Download of http://www.splintercat.org/PortlandHikers/PoisonOak02.jpg has failed after 5 retries
 Fix the download manually:
$ mkdir -p data/poison-oak
$ cd data/poison-oak
$ wget -c http://www.splintercat.org/PortlandHikers/PoisonOak02.jpg
$ tar xf PoisonOak02.jpg
 And re-run your code once the download is successful
 (the `tar xf` step in fastai's generic failure message doesn't apply to a plain JPEG; `wget` alone is enough)
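The failed file could also be fetched from Python instead of the shell. The `dest_for` helper below is a hypothetical sketch that just derives the destination path from the URL so the file slots in next to the other downloads; the actual fetch (e.g. via `urllib.request.urlretrieve`) is left out:

```python
from pathlib import Path
from urllib.parse import urlparse

def dest_for(url, folder):
    # Keep the filename from the URL path as the local filename.
    return Path(folder) / Path(urlparse(url).path).name

print(dest_for('http://www.splintercat.org/PortlandHikers/PoisonOak02.jpg',
               'data/poison-oak'))
```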

In [21]:
(path/labels[0]).ls(), (path/labels[1]).ls()
Out[21]:
((#148) [Path('data/poison-oak/00000003.jpg'),Path('data/poison-oak/00000001.jpg'),Path('data/poison-oak/00000007.jpg'),Path('data/poison-oak/00000005.jpg'),Path('data/poison-oak/00000009.jpg'),Path('data/poison-oak/00000006.jpg'),Path('data/poison-oak/00000008.png'),Path('data/poison-oak/00000012.jpg'),Path('data/poison-oak/00000010.jpg'),Path('data/poison-oak/00000013.jpg')...],
 (#175) [Path('data/leaves/00000006.jpg'),Path('data/leaves/00000005.JPG'),Path('data/leaves/00000008.jpg'),Path('data/leaves/00000007.jpeg'),Path('data/leaves/00000002.jpg'),Path('data/leaves/00000010.jpg'),Path('data/leaves/00000003.jpg'),Path('data/leaves/00000011.jpg'),Path('data/leaves/00000001.jpg'),Path('data/leaves/00000012.JPG')...])

Verify images

In [124]:
verify_images(get_image_files(path))[-5:]
Out[124]:
(#0) []
In [25]:
bad = verify_images(get_image_files(path))  # images PIL fails to open
print(bad)
for img in bad: img.unlink()  # delete the corrupt files
/opt/conda/envs/fastai/lib/python3.7/site-packages/PIL/Image.py:2860: UserWarning: image file could not be identified because WEBP support not installed
  warnings.warn(message)
(#6) [Path('data/poison-oak/00000014.jpg'),Path('data/poison-oak/00000020.jpg'),Path('data/poison-oak/00000028.jpg'),Path('data/poison-oak/00000073.png'),Path('data/poison-oak/00000140.jpg'),Path('data/leaves/00000043.png')]
In [26]:
(path/'poison-oak').ls(), (path/'leaves').ls()
Out[26]:
((#143) [Path('data/poison-oak/00000003.jpg'),Path('data/poison-oak/00000001.jpg'),Path('data/poison-oak/00000007.jpg'),Path('data/poison-oak/00000005.jpg'),Path('data/poison-oak/00000009.jpg'),Path('data/poison-oak/00000006.jpg'),Path('data/poison-oak/00000008.png'),Path('data/poison-oak/00000012.jpg'),Path('data/poison-oak/00000010.jpg'),Path('data/poison-oak/00000013.jpg')...],
 (#174) [Path('data/leaves/00000006.jpg'),Path('data/leaves/00000005.JPG'),Path('data/leaves/00000008.jpg'),Path('data/leaves/00000007.jpeg'),Path('data/leaves/00000002.jpg'),Path('data/leaves/00000010.jpg'),Path('data/leaves/00000003.jpg'),Path('data/leaves/00000011.jpg'),Path('data/leaves/00000001.jpg'),Path('data/leaves/00000012.JPG')...])

Create DataBlock

In [125]:
data = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 get_y=parent_label,
                 splitter=RandomSplitter(valid_pct=0.2, seed=42))
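`RandomSplitter(valid_pct=0.2, seed=42)` holds out a random 20% of the images for validation, reproducibly thanks to the seed. A toy re-implementation (a sketch of the arithmetic, not fastai's code) shows why the 306 images in this dataset end up as a 245/61 train/valid split:

```python
import random

def random_splitter(n_items, valid_pct=0.2, seed=42):
    # Shuffle indices deterministically, then cut off the first valid_pct as validation.
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]  # (train, valid)

train, valid = random_splitter(306)
print(len(train), len(valid))  # 245 61
```

Fixing the seed matters here: the cleaning and retraining cells below rerun the split, and a stable seed keeps validation images from leaking into training between runs.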

Apply data augmentation

In [138]:
aug_data = data.new(item_tfms=RandomResizedCrop(224), batch_tfms=aug_transforms())

Create dataloader

In [139]:
dls = aug_data.dataloaders(path, bs=64)
In [144]:
dls.show_batch(max_n=5, nrows=1, unique=True)

Look at batch

In [140]:
xb, yb = first(dls.valid)
In [141]:
xb.shape, yb.shape
Out[141]:
(torch.Size([61, 3, 224, 224]), torch.Size([61]))
In [142]:
dls.show_batch(max_n=9, figsize=(9, 9))
/opt/conda/envs/fastai/lib/python3.7/site-packages/PIL/Image.py:932: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
  "Palette images with Transparency expressed in bytes should be "
In [143]:
len(dls.train_ds), len(dls.valid_ds)
Out[143]:
(245, 61)

Modeling

Create Model

In [145]:
learn = cnn_learner(dls, resnet34, metrics=accuracy)

Train model

In [146]:
learn.fine_tune(4)
epoch train_loss valid_loss accuracy time
0 1.121740 0.621949 0.721311 00:11
epoch train_loss valid_loss accuracy time
0 1.002119 0.893823 0.639344 00:10
1 0.936509 1.225770 0.622951 00:11
2 0.812722 0.785747 0.721311 00:11
3 0.718402 0.521533 0.803279 00:11
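The two progress tables come from how `fine_tune` is documented to behave: one warm-up epoch with the pretrained body frozen (only the new head trains), then the requested 4 epochs with the whole network unfrozen. A toy stand-in (not fastai's implementation) makes the call sequence explicit:

```python
class FakeLearner:
    """Records the call sequence fine_tune is expected to make."""
    def __init__(self): self.log = []
    def freeze(self): self.log.append('freeze')
    def unfreeze(self): self.log.append('unfreeze')
    def fit_one_cycle(self, n_epoch): self.log.append(f'fit{n_epoch}')
    def fine_tune(self, epochs, freeze_epochs=1):
        self.freeze()                      # body frozen: train only the new head
        self.fit_one_cycle(freeze_epochs)  # -> the first, single-epoch table
        self.unfreeze()                    # then train the whole network
        self.fit_one_cycle(epochs)         # -> the second, 4-epoch table

learn_sketch = FakeLearner()
learn_sketch.fine_tune(4)
print(learn_sketch.log)  # ['freeze', 'fit1', 'unfreeze', 'fit4']
```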
In [151]:
learn.fine_tune(4)
epoch train_loss valid_loss accuracy time
0 0.429001 0.362298 0.868852 00:10
epoch train_loss valid_loss accuracy time
0 0.330188 0.398394 0.868852 00:11
1 0.410319 0.333300 0.836066 00:11
2 0.399348 0.337829 0.852459 00:10
3 0.363547 0.336364 0.868852 00:11

Analysis

In [152]:
interp = ClassificationInterpretation.from_learner(learn)

Plot results

In [153]:
learn.show_results(max_n=9, figsize=(9, 9))

Confusion matrix

In [154]:
interp.plot_confusion_matrix()
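A confusion matrix tabulates actuals against predictions: rows are the true class, columns the predicted class, so off-diagonal cells are the mistakes. A minimal sketch of the bookkeeping, on hypothetical data rather than this model's results:

```python
def confusion_matrix(y_true, y_pred, labels):
    # m[i][j] counts items whose true class is labels[i] and predicted class labels[j].
    idx = {l: i for i, l in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

labels = ['leaves', 'poison-oak']
y_true = ['poison-oak', 'leaves', 'poison-oak', 'leaves', 'poison-oak']
y_pred = ['poison-oak', 'leaves', 'leaves',     'leaves', 'poison-oak']
print(confusion_matrix(y_true, y_pred, labels))  # [[2, 0], [1, 2]]
```

For this task, the cell to watch is row `poison-oak`, column `leaves` (the 1 above): those are the missed poison-oak images that recall penalizes.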

Top Losses

In [155]:
interp.plot_top_losses(5, figsize=(15, 15))

Clean Dataset

Image Cleaner widget

In [66]:
cleaner = ImageClassifierCleaner(learn, height=256)
/opt/conda/envs/fastai/lib/python3.7/site-packages/PIL/Image.py:932: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
  "Palette images with Transparency expressed in bytes should be "
In [67]:
cleaner

Make sure to delete/relabel images before moving on to the next class or dataset: the Image Cleaner's tracking resets.

Delete images

In [107]:
cleaner.delete()
Out[107]:
(#2) [0,11]
In [93]:
PILImage.create(cleaner.fns[3]).show(figsize=(15, 15))
Out[93]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fbad81c8510>
In [108]:
for i in cleaner.delete(): cleaner.fns[i].unlink()

Relabel images

In [119]:
cleaner.change(), cleaner.fns[14]
Out[119]:
((#1) [(14, 'leaves')], Path('data/poison-oak/00000144.jpg'))
In [120]:
for i, new_lbl in cleaner.change(): shutil.move(str(cleaner.fns[i]), path/new_lbl)  # move into the new label's folder, keeping the original filename so multiple relabels don't overwrite each other

Inference

In [158]:
img = PILImage.create('poison-oak.jpg')
img.to_thumb(440)
Out[158]:
In [159]:
learn.predict(img)
Out[159]:
('poison-oak', tensor(1), tensor([0.2376, 0.7624]))
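`learn.predict` returns `(decoded_label, class_index, probabilities)`, with the probabilities ordered by the DataLoaders' vocab; `CategoryBlock` sorts labels alphabetically, so the vocab here should be `['leaves', 'poison-oak']`. Decoding the tuple above by hand:

```python
vocab = ['leaves', 'poison-oak']   # dls.vocab order (alphabetical)
probs = [0.2376, 0.7624]           # per-class probabilities from the prediction above
pred_idx = max(range(len(probs)), key=probs.__getitem__)  # argmax
print(vocab[pred_idx], probs[pred_idx])  # poison-oak 0.7624
```

So the model calls this image poison oak at about 76% confidence: correct, but not emphatic, which fits the TODO items about cleaner data and more counter-examples.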

Build Web App